Abstract: Analyses of learning based on student discourse need to account not only for the 
content of the utterances but also for the ways in which students make connections across turns 
of talk. This requires segmentation of discourse data to define when connections are likely to be 
meaningful. In this paper, we present an approach to segmenting data for the purposes of 
modelling connections in discourse using epistemic network analysis. Specifically, we use 
epistemic network analysis to model connections in student discourse using a temporal 
segmentation method adapted from recent work in the learning sciences. We compare the 
results of this study to a purely conversation-based segmentation method to examine the 
affordances of temporal segmentation for modelling connections in discourse. 
NOTES FOR PRACTICE 

• When analyzing learning based on student discourse, we need to account not only for the 
content of student talk but also for the ways in which students make connections within a 
conversation. However, this requires segmentation of discourse data to define when 
connections are likely to be meaningful. 

• This methods paper uses epistemic network analysis to compare how connections are modelled 
under two segmentation methods: the conversation method, which models connections within an entire 
activity, and the moving stanza window method, which models connections within a 
conversation by dividing the activity into multiple overlapping stanzas. 

• An important benefit of the moving stanza window method is that it models the role of 
individual contributions to group discussions. By using a sliding window of fixed size to 
establish the analytic context, researchers can create models of discourse that update 
with each new contribution to the conversation. 

• Many CSCL environments already include integrated feedback and assessment. Using the moving 
stanza window method to model individual contributions to group discussions within a chat's recent 
temporal context could also allow teachers to assess student performance in real time in online 
environments. 


1 INTRODUCTION 

Analyzing high-volume discourse data is a challenge in computer-supported collaborative learning (CSCL) 
environments because student conversations in such environments are characterized not only by what 
is said but also by patterns of language use within social practices (Gee, 1990). This suggests that analyses of 
learning based on student discourse need to account not only for the content of the utterances but also 
for the ways in which students make connections across turns of talk. Any analysis of such connections, 
however, requires segmentation of discourse data to identify the conditions under which connections 
are likely to be meaningful (Hearst, 1994). In this paper, we present an approach to segmenting data for 
the purposes of modelling connections in discourse. Specifically, we use epistemic network analysis 
(Shaffer et al., 2009) to model connections in student discourse using a temporal segmentation method 
adapted from recent work in the learning sciences (Dyke, Kumar, Ai, & Rose, 2012; Suthers & Desiato, 
2012). We compare the results to a conversation-based segmentation method to examine the 
affordances of temporal segmentation for modelling connections in discourse. 

2 THEORY 

There are a number of theoretical perspectives in the learning sciences that describe one's 
understanding of a topic, process, domain, or practice in terms of the structure of understanding; that is, 
the way concepts, skills, and habits of mind are related to one another systematically. Chi, Feltovich, and 
Glaser (1981), for example, found that experts in physics organize their understanding differently than 
novices. Bransford, Brown, and Cocking (1999) showed that the organization of experts' content 
knowledge reflects their deep understanding of subject matter. DiSessa (1988) suggests that while 
solving physics problems requires understanding basic concepts from the discipline, deep and 
systematic understanding comes from linking such concepts to one another within a theoretical 
framework. Similarly, Shaffer (2012) characterizes learning as the development of an epistemic frame: a 
pattern of associations among knowledge, skills, habits of mind, and other cognitive elements that 
characterizes communities of practice, or groups of people who share similar ways of framing, 
investigating, and solving complex problems. 

Not surprisingly, research on discourse processing suggests that connections among concepts are made 
primarily on a topic-by-topic basis rather than across discourse as a whole. For example, Gernsbacher's 
(1991; see also Graesser, Gernsbacher, & Goldman, 1997) theory of language processing suggests that 
students use hierarchical organization of content to build understanding. Discourse is structured by 
topic, with concepts having clear relationships to one another within topics and few relationships across 
topics. 

Similarly, epistemic network analysis (ENA) analyzes the structure of connections in student discourse by 
looking at the co-occurrence of concepts within the conversations, topics, or activities that take place 
during learning. Building on the idea of learning as the development of an epistemic frame, ENA creates 
a discourse network model of thinking by identifying the co-occurrence of skills, knowledge, values, and 
other elements of work in a particular community of practice (Shaffer et al., 2009). The co-occurrences 
are identified within collections of related utterances, which are nested within activities, a fundamental 
unit of analysis in ENA. Prior work by Collier, Ruis, and Shaffer (2016) has shown that analyzing 
connections within activities is a more sensitive measure than analyzing correlations of ideas in a corpus 
of data overall, and a number of studies (Arastoopour, Swiecki, Chesler, & Shaffer, 2015; Chesler et al., 
2015; Knight, Arastoopour, Shaffer, Shum, & Littleton, 2014) have used ENA to analyze student learning 
at the activity level. 

There are, however, two problems with such an approach. First, as Stahl, Koschmann, and Suthers 
(2006) argue, learning needs to be analyzed at both the group and the individual level. Stahl (2009), for 
example, conducted parallel qualitative analyses of the mathematics learning of a group and of the 
individuals in the group. But as Cress and Hesse (2013) point out, because learners work in groups, 
simple t-tests and ANOVAs do not effectively model the influence that groupmates have on one 
another. Thus, creating a quantitative model of group discourse that accounts for the contributions of 
any single individual within the group discussion remains a challenge. 

A second problem is that the aggregation of connections using the entire activity may incorrectly 
connect ideas that are in fact not within the same context (Arvaja, Salovaara, Hakkinen, & Jarvela, 2007). 
While ideas are surely connected within conversations or activities, such connected ideas are most likely 
to occur in close temporal proximity. During discussions, students simultaneously build group and 
individual understanding by "saying" and replying to "what is said" (Wells, 1999). Speech typically 
addresses another instance of speech and anticipates a response (Bakhtin, 1986). Because "thinking and 
speech are, in this sense, always derivative of prior thinking and speech" (Smagorinsky, 2011, p. 23), 
students build on the ideas of their team members to mediate their discussion of concepts. Therefore, 
to measure connections in conversations, we need a method to model connection-making on shorter 
time scales than entire activities. 

Recent work by Dyke and colleagues (2012) and Suthers and Desiato (2012) proposes using sliding 
window analyses to model temporal connections in discourse within their recent temporal context. 
Rather than creating summary values for all utterances in an activity, a sliding window can analyze 
recent temporal context by computing a value for a smaller section of an activity — typically a small 
amount of time (e.g., 10 seconds) or a small number of utterances (e.g., three turns of talk; Dyke et al., 
2012). The window is sliding in the sense that a summary value is computed for each utterance, based 
on the preceding lines of talk (e.g., the preceding 10 seconds or three lines of talk). Other forms of 
sliding window analyses have been used to identify shifts in topic (Rose et al., 2008), visualize semantic 
similarities between utterances (e.g., PolyCAFe; Trausan-Matu, Dascalu, & Rebedea, 2014), and more 
generally to provide new insights on previously analyzed data (Dyke et al., 2012). By analyzing discourse 
in smaller segments that are temporally related, a sliding window approach is less likely to take an 
utterance out of context than an approach that examines connections across an entire activity. 
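As a concrete illustration of the sliding-window idea, the Python sketch below computes one summary value per utterance from that utterance and up to three preceding turns of talk. The choice of summary value (the number of distinct codes in the window), the function name `sliding_window_summaries`, and the sample codes are assumptions made for illustration; they are not the measures used by Dyke et al. (2012) or Suthers and Desiato (2012).

```python
from typing import List, Sequence, Set

def sliding_window_summaries(coded_utterances: Sequence[Set[str]], window: int = 3) -> List[int]:
    """One summary value per utterance: the number of distinct codes in the
    utterance plus up to `window` preceding utterances (illustrative measure)."""
    summaries = []
    for i in range(len(coded_utterances)):
        start = max(0, i - window)  # the window is truncated at the start of the activity
        codes_in_window = set().union(*coded_utterances[start:i + 1])
        summaries.append(len(codes_in_window))
    return summaries

# Each utterance is represented by the (hypothetical) set of codes it contains.
chat = [{"Collaboration"}, {"Data"}, set(), {"Design Reasoning", "Data"}, {"Technical Constraints"}]
print(sliding_window_summaries(chat, window=3))
```

Because the window is truncated at the start of an activity, the first few utterances are summarized over fewer lines than later ones.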

Although sliding windows measure discourse on small time scales, sliding windows alone neither 
measure connections among codes nor address how people collaboratively co-construct 
knowledge. To measure connections between ideas, Suthers and Desiato (2012) proposed measuring 
uptake: modelling structures of connections that show when participants refer to prior events and 
how such references help continue the conversation. However, while Suthers and Desiato's model showed 
when each actor used another actor's contribution, it showed only whether a connection was 
made, not which connection was made or the semantic structure of connections. 

In what follows, we model the semantic structure of connections in discourse and use ideas from Shaffer 
(2017) that build on Gee's (1990) work to create an ENA model using a moving window approach. When 
analyzing discourse, first we identify the smallest unit of analysis as a single line, which in CSCL discourse 
is often a turn of talk. After designating lines, we group these lines together into conversations, which 
are the set of all lines from a single team during a single activity. For instance, all chat utterances in a 
CSCL environment may be designated as a line and then grouped by each activity in that environment 
into a conversation. By segmenting data into a conversation, we assume that all lines within that 
conversation are equally related, when they may not be. Therefore, within conversations we can define 
stanzas, which are sets of related lines within that conversation. Gee argues that single lines or 
utterances in talk are grouped together into sets of related lines called stanzas. The analogy is to a 
poem: lines are related within a stanza and within the poem as a whole (which corresponds to a 
conversation), but not across poems. Using this idea, ENA can model the co-occurrence of ideas by 
conversations or by stanzas within conversations. 
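A minimal sketch of this hierarchy, assuming each chat message is one line and a conversation is keyed by (team, activity). The `Line` class, the `group_into_conversations` helper, the activity numbers, and the placeholder message texts are all hypothetical; stanzas would then be defined as subsets of each conversation's ordered lines.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Line:
    """One line of discourse: here, a single chat message (turn of talk)."""
    team: str
    activity: int
    speaker: str
    text: str

def group_into_conversations(lines: List[Line]) -> Dict[Tuple[str, int], List[Line]]:
    """Group lines into conversations keyed by (team, activity), keeping chat order."""
    conversations: Dict[Tuple[str, int], List[Line]] = defaultdict(list)
    for line in lines:
        conversations[(line.team, line.activity)].append(line)
    return dict(conversations)

# Hypothetical chat log; activity numbers and texts are placeholders.
chat_log = [
    Line("Hydraulic", 1, "Jimmy", "(text of message 1)"),
    Line("Hydraulic", 2, "Connor", "(text of message 2)"),
    Line("Hydraulic", 1, "Jordan", "(text of message 3)"),
]
for key, conversation in group_into_conversations(chat_log).items():
    print(key, [line.speaker for line in conversation])
```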

In this study, we use the idea of conversations and stanzas to delineate two different approaches to 
modelling connections using ENA. Both approaches use ENA to model connections among concepts, but 
they segment the data differently: 1) by identifying a conversation as an entire activity; and 2) by 
identifying stanzas as collections of utterances within conversations. Specifically, they are as follows: 

1. The Conversation Method¹ models connections within an entire activity; that is, all the 
utterances within an activity are related to one another. Or, equivalently, each activity is 
composed of a single stanza. 

2. The Moving Stanza Window Method models connections within a conversation by dividing the 
activity into multiple overlapping stanzas; that is, utterances are related to one another only 
within some designated stanza window. Thus, the moving stanza window method models 
connections only when utterances are in close temporal proximity within an activity. 

In what follows, we compare the two ENA segmentation methods by looking at data from a CSCL 
learning environment in which students collaboratively design solutions to engineering problems. To 
evaluate the strengths and limitations of the two approaches to segmentation, we created ENA models 
using both the conversation method and the moving stanza window method. In this study, we focus on 
the discourse of one representative team and ask: 

Does the moving stanza window method provide information about group discourse that the 
conversation method does not? 

3 METHODS 

3.1 The Engineering Virtual Internship RescuShell 

RescuShell is a 10-week-long engineering virtual internship in which students roleplay as engineering 
interns at a fictional mechanical engineering design firm, working to develop robotic legs for a 
mechanical exoskeleton for use by rescue personnel. Students use an online work portal with email and 
an instant messaging chat window to engage in 17 different activities that simulate various steps in the 
design process, including reviewing and summarizing research reports, creating device prototypes, 
discussing design choices with teammates, and working to balance the needs of various internal 
consultants and external clients. During these activities, students research how each of the five internal 
consultants in RescuShell prioritizes two performance parameters and requests specific threshold values 
for each of these parameters. For example, the biomedical engineer prefers a device with high agility 
and high safety, while the environmental engineer prefers a device with a high recharge interval and a 
low cost. Students try to meet the internal consultants' requests by exploring how various technical 
constraints (e.g., actuators, power sources, range of motion, sensors, and materials) affect the 
performance parameters. However, the internal consultants' concerns conflict with one 
another (e.g., as recharge interval decreases, cost also increases). Therefore, students must balance 
client and consultant requests and justify their design decisions when designing and testing exoskeleton 
prototypes. 

¹ In other writings, we have referred to the conversation method as the strophe or topic method. For this analysis, we have 
simplified the language to conversation to reflect that we separated the discourse based on entire conversations about a topic 
or activity. 


In this study, we focused on the first eleven activities of the internship, during which students worked 
in one of five randomly assigned teams, each of which explored the use of a particular actuator in the 
exoskeleton design (hydraulic, PAM, electric, pneumatic, or series elastic). Forty-four first-year 
engineering students participated in the virtual internship, which took approximately 15 hours to 
complete. From this sample, we selected one representative team and analyzed how its five students 
(4 male, 1 female) discussed the design problem in the first half of the internship. 

3.2 Discourse Analyses 

3.2.1 Coding student chats 

We collected chat log data from teams and segmented it by utterance, defined as a single message sent 
by a student in the chat program. We developed a set of codes to represent the key elements of the 
engineering design process (see Table 1). 


Table 1. Engineering Design Coding Scheme 

Code Name | Description | Example 
Design Reasoning | Referring to design development, prioritization, trade-offs, and design decisions | "Aluminum and Composite are good options. Steel can carry a big load, but it is heavy and weighs down on the recharge interval, and it is a costly option." 
Performance Parameters | Referring to attributes: payload, recharge interval, agility, safety, or cost. | "My device has a pretty good safety, payload, agility, and recharge interval; the cost is a little high though." 
Technical Constraints | Referring to inputs: actuators, ROM, materials, power sources, or sensors. | "Our two best were both made with Aluminum, NiCd Batteries, Piezoelectric sensors, and Pneumatic actuators." 
Client and Consultant Requests | Referring to or justifying decisions based on internal consultants' requests or the client's health or comfort | "We tried to meet at least the minimum of each of the internal consultant's requests." 
Collaboration | Facilitating a joint meeting or the production of team design products. | "How should we make our team batch?" 
Data | Referring to or justifying decisions based on numerical values, results tables, graphs, research papers, or relative quantities. | "I thought that safety near the maximum was not very good (close to 225 - one had 218 RPN), but other than that, I was fine with the safety as long as it was around 200 or lower." 

Because of the high volume of chat data (3824 utterances), we applied the coding scheme to 
each utterance using an automated coding process that uses keywords for regular expression matching 
(Shaffer et al., 2015; Arastoopour et al., 2015). We validated all six codes using a series of comparisons 
between two human raters and the computer, with resulting Cohen's kappa scores between 0.83 and 
1.00 (see Table 2). The interrater reliability analysis shows that all pairwise agreements among rater 1, 
rater 2, and the computer meet standards for kappa (Landis & Koch, 1977). We used a Monte Carlo 
rejection technique, Shaffer's rho, to determine for each kappa value the likelihood that it would be 
found by two coders if their true rate of agreement was less than a kappa of 0.65 (Shaffer et al., 2015). 
As shown in Table 2 below, all of the kappa values achieved have Shaffer's rho values less than 0.05, 
meaning that the Type I error rate for concluding that the coders would agree above a kappa of 0.65 
if they coded the whole data set is below 0.05. 


Table 2. Interrater Reliability Analysis between Two Raters and an Automated Coding Scheme 

Code Name | Kappa between Rater 1 and Rater 2 | Kappa between Rater 1 and Autocoder | Kappa between Rater 2 and Autocoder 
Design Reasoning | 0.89** | 0.89* | 0.89** 
Performance Parameters | 0.89** | 1.00** | 0.89** 
Technical Constraints | 0.83** | 0.94** | 0.89** 
Client and Consultant Requests | 1.00** | 1.00* | 1.00** 
Collaboration | 1.00** | 1.00* | 1.00** 
Data | 0.90** | 0.87** | 0.89** 

Note: *rho < 0.05, **rho < 0.01 
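The sketch below illustrates the two validation steps described above for a single code: keyword-based regular-expression coding of each utterance, followed by Cohen's kappa between the automated codes and one rater's codes. The keyword pattern for Data, the sample utterances, and the human codes are hypothetical and are not the coding rules used in this study; Shaffer's rho, which requires a Monte Carlo simulation, is not implemented here.

```python
import re
from collections import Counter
from typing import List, Sequence

# Hypothetical keyword pattern for the Data code (not the study's actual rules).
DATA_PATTERN = re.compile(r"\b(graphs?|tables?|results?|\d+(?:\.\d+)?)\b", re.IGNORECASE)

def autocode(utterances: Sequence[str], pattern: re.Pattern) -> List[int]:
    """Assign 1 if the pattern matches anywhere in the utterance, else 0."""
    return [int(bool(pattern.search(u))) for u in utterances]

def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)) / n**2
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout (convention)
    return (observed - expected) / (1 - expected)

utterances = [
    "The graphs indicated the properties of all the different options",
    "They all had both advantages and disadvantages.",
    "Safety was around 200 or lower.",
    "How should we make our team batch?",
    "The results table shows agility went up.",
    "I agree",
]
human = [1, 0, 1, 0, 0, 0]  # hypothetical rater codes
auto = autocode(utterances, DATA_PATTERN)
print(auto, round(cohens_kappa(human, auto), 3))
```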


We then performed a chronologically oriented representation of discourse and tool-related activity 
(CORDTRA) analysis (Hmelo-Silver, Liu, & Jordan, 2009) during one activity to show the temporal pattern 
of the six codes in student discourse. Researchers use CORDTRA diagrams as a visualization technique to 
reveal patterns in collaborative discourse. In a CORDTRA diagram, each horizontal line represents a 
code, each point on these lines represents an instance of a specific code, and the X-axis represents 
discourse units over time. 
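A text-only approximation of a CORDTRA diagram can be sketched as follows: one row per code, one column per utterance in chat order, and a mark wherever the code occurs. The `cordtra_rows` helper and the sample coded activity are hypothetical, and published CORDTRA diagrams are plotted graphically rather than as text.

```python
from typing import List, Sequence, Set

def cordtra_rows(coded_utterances: Sequence[Set[str]], codes: Sequence[str]) -> List[str]:
    """One text row per code: '*' where the code occurs in an utterance, '.' otherwise."""
    width = max(len(code) for code in codes)  # left-align code labels in one column
    rows = []
    for code in codes:
        marks = "".join("*" if code in utterance else "." for utterance in coded_utterances)
        rows.append(f"{code:<{width}} {marks}")
    return rows

codes = ["Collaboration", "Data", "Design Reasoning"]
activity = [{"Collaboration"}, {"Data", "Design Reasoning"}, set(), {"Design Reasoning"}, {"Collaboration"}]
for row in cordtra_rows(activity, codes):
    print(row)
```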

3.2.2 Epistemic network analysis 

ENA models the structure of connections among engineering epistemic frame elements by quantifying 
the co-occurrences of codes within a stanza (Shaffer et al., 2009; Shaffer, 2014). After defining the 
segmentation structure, ENA creates an adjacency matrix representing the co-occurrences of codes in 
each stanza. To construct an adjacency matrix, ENA assigns a one to each unique pair of codes that 
co-occur one or more times in the stanza's utterances, and a zero to each unique pair that does not 
co-occur in the stanza. ENA sums the adjacency matrices into a cumulative adjacency matrix, where each cell 
represents the number of stanzas (i.e., the number of adjacency matrices) in which that unique pair of 
codes was present. Each person's or team's collection of co-occurrences is thus represented by a 
cumulative adjacency matrix that summarizes the pattern of connections among codes. 
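The stanza-level and cumulative adjacency matrices described above can be sketched in Python as follows. Because the matrix is symmetric, this sketch stores only one entry per unique pair of codes; the helper names and the sample stanzas are illustrative assumptions, not part of an ENA software package.

```python
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

Pair = Tuple[str, str]

def stanza_adjacency(stanza: Sequence[Set[str]], codes: Sequence[str]) -> Dict[Pair, int]:
    """Binary adjacency for one stanza: 1 if both codes in a pair occur
    somewhere in the stanza's utterances, otherwise 0."""
    present: Set[str] = set().union(*stanza)
    return {(a, b): int(a in present and b in present) for a, b in combinations(codes, 2)}

def cumulative_adjacency(stanzas: Sequence[Sequence[Set[str]]], codes: Sequence[str]) -> Dict[Pair, int]:
    """Sum the per-stanza matrices: each cell counts the stanzas containing that pair."""
    totals = {pair: 0 for pair in combinations(codes, 2)}
    for stanza in stanzas:
        for pair, value in stanza_adjacency(stanza, codes).items():
            totals[pair] += value
    return totals

codes = ["Data", "Design Reasoning", "Technical Constraints"]
stanzas = [
    [{"Data"}, {"Design Reasoning"}],          # pair (Data, Design Reasoning) present
    [{"Data", "Technical Constraints"}],       # pair (Data, Technical Constraints) present
    [{"Data", "Design Reasoning"}, {"Data"}],  # repeated codes still count once per stanza
]
print(cumulative_adjacency(stanzas, codes))
```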
ENA then converts the cumulative adjacency matrices into cumulative adjacency vectors that are 
projected into a high-dimensional space based on the co-occurrence of codes across segments. These 
cumulative adjacency vectors are normalized to control for the varying lengths of vectors by dividing 
each vector by its length; the resulting vector thus represents the relative frequency of co-occurrences. 
ENA then performs a singular value decomposition on the normalized vectors. This produces a rotation 
of the original high-dimensional space, such that the rotated space provides a reduced number of 
dimensions that capture the maximum variance in the data. 
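A minimal NumPy sketch of the normalization and rotation steps, assuming each row is one unit's cumulative adjacency vector (one column per unique code pair). Mean-centering the normalized vectors before the singular value decomposition is an assumption of this sketch, as is the helper name `ena_rotation`; the full ENA procedure includes further steps not shown here.

```python
import numpy as np

def ena_rotation(cumulative_vectors, n_dims: int = 2):
    """Normalize each unit's vector to unit length, then rotate with SVD.
    Mean-centering before the SVD is an assumption of this sketch."""
    vectors = np.asarray(cumulative_vectors, dtype=float)
    lengths = np.linalg.norm(vectors, axis=1, keepdims=True)
    lengths[lengths == 0] = 1.0  # units with no connections stay all-zero
    normalized = vectors / lengths  # relative frequency of each co-occurrence
    centered = normalized - normalized.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    points = centered @ vt[:n_dims].T  # coordinates on the first n_dims rotated dimensions
    explained = singular_values**2 / np.sum(singular_values**2)
    return normalized, points, explained[:n_dims]

# Rows: units (e.g., teams or students); columns: unique code pairs (hypothetical counts).
vectors = np.array([[2, 1, 0], [0, 1, 1], [1, 0, 3], [1, 1, 1]])
normalized, points, explained = ena_rotation(vectors)
print(points.shape, explained.round(3))
```

Here `explained` gives the share of variance captured by each retained dimension, the kind of percentage reported alongside ENA network graphs.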

The resulting models can be visualized as networks in which the nodes in the model are the codes and 
the lines connecting the nodes represent the co-occurrence of two codes. Thus, we can quantify and 
visualize the structure of connections among engineering design codes, making it possible to 
characterize student discourse during the virtual internship. 

3.3 Comparison of Segmentation Procedures 

In this study, we compared two methods of segmenting data for use in ENA: the conversation method 
and the moving stanza window method. For the conversation segmentation method, ENA created one 
adjacency matrix for each activity and then summed the matrices across the 11 activities for a given 
team. 
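Under the conversation method, each activity is treated as a single stanza. The sketch below, with a hypothetical function name and sample data, computes one binary adjacency matrix per activity and sums the matrices across a team's activities.

```python
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

def conversation_method(activities: Sequence[Sequence[Set[str]]],
                        codes: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """Each activity is one stanza: a pair counts once per activity if both
    codes appear anywhere in that activity."""
    totals = {pair: 0 for pair in combinations(codes, 2)}
    for utterances in activities:
        present = set().union(*utterances)
        for a, b in totals:
            totals[(a, b)] += int(a in present and b in present)
    return totals

codes = ["Collaboration", "Data", "Design Reasoning"]
activities = [
    [{"Collaboration"}, {"Data"}, {"Design Reasoning"}, {"Collaboration"}],
    [{"Data", "Design Reasoning"}],
]
print(conversation_method(activities, codes))
```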

The moving stanza window method created a referent adjacency matrix for each utterance in turn; the 
utterance being modelled is known as the referring utterance. The referent adjacency matrix for each 
utterance was constructed from two types of co-occurrences of codes: 1) co-occurrences within the 
referring utterance, and 2) co-occurrences between the referring utterance and a specific number of 
previous utterances, known as 
the window. The moving window then moved to the next referring utterance and created the next 
referent adjacency matrix. This process continued until the end of the defined conversation and then 
ENA summed the matrices across all utterances for that unit. No windows were made across activities 
(conversations), only within them. Figure 1 shows how the conversation method and the moving stanza 
window method created different models of connectivity. 


[Figure 1 panels: Coded Data; Moving Stanza Window Method; Conversation Method] 


Figure 1. Example of coded data from one activity (a). The moving stanza window method analyzes 
connections within the referring utterance and between the referring utterance and the window (b). 
After analyzing a window, the moving stanza method slides to the next utterance and repeats the 
process of finding connections within and between the referring utterance and the window (c). The 
conversation method analyzes all connections in an activity (d). 
Co-occurrences of codes within or across non-referring utterances were not included in the referent 
adjacency matrix, which eliminated double-counting of connections when the cumulative adjacency 
matrix was computed. 
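The referent adjacency matrix and its accumulation can be sketched as follows, assuming each utterance is represented by the set of codes it contains. Only pairs within the referring utterance, or between the referring utterance and the window, are counted; pairs that fall entirely within the window are skipped, which is what prevents double-counting. Function names and sample data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

Pair = Tuple[str, str]

def referent_adjacency(conversation: Sequence[Set[str]], i: int,
                       codes: Sequence[str], window: int = 3) -> Dict[Pair, int]:
    """Binary adjacency for referring utterance i and its preceding window."""
    referring = conversation[i]
    context = set().union(*conversation[max(0, i - window):i])  # empty when i == 0
    matrix = {}
    for a, b in combinations(codes, 2):
        within = a in referring and b in referring
        between = (a in referring and b in context) or (b in referring and a in context)
        matrix[(a, b)] = int(within or between)  # pairs only inside the window are ignored
    return matrix

def moving_stanza_window(activities: Sequence[Sequence[Set[str]]],
                         codes: Sequence[str], window: int = 3) -> Dict[Pair, int]:
    """Sum referent matrices over every utterance; windows never cross activities."""
    totals = {pair: 0 for pair in combinations(codes, 2)}
    for conversation in activities:
        for i in range(len(conversation)):
            for pair, value in referent_adjacency(conversation, i, codes, window).items():
                totals[pair] += value
    return totals

codes = ["Collaboration", "Data", "Design Reasoning"]
activities = [
    [{"Collaboration"}, {"Data"}, {"Design Reasoning"}, {"Collaboration"}],
    [{"Data", "Design Reasoning"}],
]
print(moving_stanza_window(activities, codes, window=1))
```

With `window=1`, each referring utterance is connected only to the turn immediately before it; a larger window links codes across more turns of talk.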

3.4 Comparison of Network Models 

To analyze the different segmentation methods using ENA, we created three models: 1) a conversation 
model for all teams in the sample, 2) a moving stanza window model with a window size of three for all 
teams in the sample, and 3) a moving stanza window model with a window size of three for all students 
in the sample. We chose a window size of three based on a qualitative analysis of the data, which 
suggested that most explicit connections between ideas in the discourse occurred within a span of four 
or fewer lines (the referring utterance plus the preceding three turns of talk). All three models were 
projected into the dimensional reduction computed for the team moving stanza window model so that the 
resulting networks could be compared. To analyze the differences between the two segmentation 
methods, we chose a representative team and closely examined its discourse. First, we compared the 
conversation model with the moving stanza window model of the team's discourse; then we used the 
moving stanza window model to examine individual contributions to the team's discourse. 

4 RESULTS 

For the purposes of this analysis, we examined the conversations of one representative student project 
team. The Hydraulic team had five team members: Arden, Connor, Margaret, Jimmy, and Jordan. We 
modelled their collaborative design work over the first 11 activities of the virtual internship, which 
included background research into principles of biomechanics, as well as the design, testing, and 
evaluation of an initial prototype for a robotic exoskeleton. 

4.1 Conversation and Moving Stanza Window Models for the Hydraulic Team 

We used both the conversation method and the moving stanza window method to model the discourse 
of the team. Both models (see Figure 2) show that the connections to and between technical constraints 
and design reasoning were prominent in the group's design discussions. This is represented by larger 
node sizes and thicker lines in the ENA network graph linking the nodes that correspond to those 
discourse elements. This is, of course, hardly surprising, as the group's primary goal was to choose 
appropriate design features (input constraints) to maximize the function of their device. 

However, the conversation method (Figure 2a) suggests that the Hydraulic team connected these 
features of design with explicit discussion of their collaboration process; in contrast, the moving stanza 
window method (Figure 2b) suggests that the team spent less time explicitly connecting talk about 
collaboration to their design work and more time linking the technical constraints and design reasoning 
to other elements of the problem space, representing explicit discussion about how to balance 
competing needs involved in the design process. 
Figure 2. Network graphs of the Hydraulic team's discourse produced using (a) the conversation 
method and (b) the moving stanza window method. Thicker lines denote more frequent connections 
between codes. Percentages indicate the amount of variance explained by each dimension; in this 
analysis, 57% of the total variance in the data set is accounted for. 

This contrast is shown more clearly by computing the difference between the two network models 
(Figure 3). The difference between the network models is computed by subtracting the weight of each 
connection in one network from the corresponding weighted connection in the second network to 
obtain one network representation. Figure 3 shows a higher number of connections in the conversation 
method (red lines in the figure) to the node for collaboration, suggesting that links between the 
collaboration and other elements of the epistemic frame of engineering are a prominent feature of 
student discourse in this model. In contrast, the moving stanza window method (blue) suggests that 
students made more connections between the design elements of technical constraints, performance 
parameters, and design reasoning. 



Figure 3. Subtracted network of the Hydraulic team's discourse, in which blue connections occur more 
frequently with the moving stanza method and red connections occur more frequently with the 
conversation method. 
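The subtraction itself is an edge-by-edge difference. The sketch below assumes both networks are dictionaries of comparable (for example, normalized) edge weights keyed by code pair; the helper name and the weights are hypothetical, not values from Figure 3.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]

def subtract_networks(first: Dict[Pair, float], second: Dict[Pair, float]) -> Dict[Pair, float]:
    """Edge-wise difference: positive values are stronger in `first`,
    negative values are stronger in `second`; missing edges count as 0."""
    pairs = sorted(set(first) | set(second))
    return {pair: first.get(pair, 0.0) - second.get(pair, 0.0) for pair in pairs}

conversation_net = {("Collaboration", "Data"): 0.5, ("Data", "Design Reasoning"): 0.2}
window_net = {("Collaboration", "Data"): 0.1, ("Data", "Design Reasoning"): 0.6}
for pair, diff in subtract_networks(conversation_net, window_net).items():
    side = "conversation" if diff > 0 else "moving window" if diff < 0 else "equal"
    print(pair, round(diff, 2), side)
```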
4.2 Comparing Connections within a Single Conversation 

To further explore the differences between the two models of discourse, we examined the frequency of 
codes for each team within each conversation in the virtual internship. For example, when students met 
with their teammates to design devices, the discourse included references to collaboration, which 
was one of the key differences between the two models. To understand why there was such a 
substantial difference in connections to collaboration, we examined patterns of codes using a CORDTRA 
representation for this activity (Figure 4). 

The CORDTRA shows that students explicitly talked about collaboration only at the start and at the end 
of the activity. Nevertheless, applying the conversation method to this activity in the previous analysis 
produced connections between collaboration and codes that appeared at any point within the activity. 

In contrast, applying the moving stanza window produced connections between codes only if the codes 
co-occurred within recent temporal proximity; that is, within three utterances of the referring utterance. 
Thus, the moving stanza window model shows a less prominent role for collaboration. 



[Figure 4 image not reproduced; x-axis: Utterance] 

Figure 4. CORDTRA diagram of Hydraulic team discourse codes during one design activity. 
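The effect described in Section 4.2 can be reproduced with a small hypothetical activity in which collaboration is coded only in the first and last utterances. This simplified sketch records only which pairs are linked, not their weights, uses a window of three, and its utterance codes are invented for illustration.

```python
from itertools import combinations

CODES = ["Collaboration", "Data", "Design Reasoning",
         "Performance Parameters", "Technical Constraints"]

def conversation_links(conversation):
    """Conversation method: any two codes present in the activity are linked."""
    present = set().union(*conversation)
    return {(a, b) for a, b in combinations(CODES, 2) if a in present and b in present}

def window_links(conversation, window=3):
    """Moving stanza window: link codes within the referring utterance or
    between it and the preceding `window` utterances."""
    links = set()
    for i, referring in enumerate(conversation):
        context = set().union(*conversation[max(0, i - window):i])
        for a, b in combinations(CODES, 2):
            if (a in referring and (b in referring or b in context)) or (b in referring and a in context):
                links.add((a, b))
    return links

activity = [
    {"Collaboration"}, {"Design Reasoning"}, {"Technical Constraints"},
    {"Technical Constraints", "Design Reasoning"}, {"Data", "Performance Parameters"},
    {"Design Reasoning"}, {"Technical Constraints"}, {"Design Reasoning"}, {"Collaboration"},
]
collab_conv = sorted(b for a, b in conversation_links(activity) if a == "Collaboration")
collab_win = sorted(b for a, b in window_links(activity) if a == "Collaboration")
print("conversation:", collab_conv)
print("moving window:", collab_win)
```

In this toy activity, the conversation method links collaboration to every other code, while the moving window links it only to codes that occur within three turns of a collaboration utterance.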

4.3 Contrasting Connections between Individuals 

A second consideration in comparing the conversation method and the moving stanza window method 
is that the conversation method suffers from the same limitation as many extant techniques for 
modelling CSCL (e.g., CORDTRA): it can model a group conversation, but it does not effectively model the 
participation of one individual in the context of a group discussion. The moving stanza window method, 
in contrast, can account for this important component of collaborative learning. 
The reason for this difference is that the conversation method uses a single adjacency matrix to model 
each activity, and that matrix incorporates the contributions of all members of the group. There is thus 
no method for disentangling the contribution of any one individual. In contrast, the moving stanza 
window method models each utterance as its own referent adjacency matrix, which shows the connections 
that one utterance, and thus one individual, contributes to the group discourse. As a result, we can use 
the moving stanza window method to examine the connections that each individual makes to the 
collaborative discussion of the group. 

In this study, we modelled the contributions of two students, Jimmy and Connor, to the Hydraulic 
team's discussion. We constructed a network model of each of the two students' contributions, where 
each model included only those stanza windows in which the referring utterance belonged to that 
individual (Figure 5). These models thus represent the unique contributions to the team discussion made 
by each student. 



Figure 5. Moving stanza window model for Jimmy's (a) and Connor's (b) discourse. Thicker lines 
denote more frequent connections between discourse codes. 
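A sketch of how an individual model can be drawn from the moving stanza window, assuming each utterance is stored as a (speaker, codes) pair: only windows whose referring utterance belongs to the chosen speaker are counted, but the window itself may contain teammates' utterances. The helper name is hypothetical, and the sample codes are loosely modelled on the excerpt in Table 3, with Line 4 taken to include Performance Parameters.

```python
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

def individual_network(conversation: Sequence[Tuple[str, Set[str]]], speaker: str,
                       codes: Sequence[str], window: int = 3) -> Dict[Tuple[str, str], int]:
    """Sum referent matrices only for windows whose referring utterance is by `speaker`."""
    totals = {pair: 0 for pair in combinations(codes, 2)}
    for i, (who, referring) in enumerate(conversation):
        if who != speaker:
            continue  # other speakers' utterances appear only as window context
        context = set().union(*(c for _, c in conversation[max(0, i - window):i]))
        for a, b in totals:
            if (a in referring and (b in referring or b in context)) or (b in referring and a in context):
                totals[(a, b)] += 1
    return totals

codes = ["Data", "Design Reasoning", "Performance Parameters", "Technical Constraints"]
excerpt = [
    ("Jimmy", {"Design Reasoning"}),
    ("Jimmy", {"Data", "Design Reasoning"}),
    ("Jordan", {"Data", "Technical Constraints"}),
    ("Connor", {"Technical Constraints", "Performance Parameters"}),
]
print(individual_network(excerpt, "Jimmy", codes))
print(individual_network(excerpt, "Connor", codes))
```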

The moving stanza window networks show that, across all eleven activities (or conversations), Connor's
and Jimmy's individual contributions to the group discourse differ. This contrast is shown more clearly by
computing the difference between the two individual network models (Figure 6). Figure 6 shows more
frequent connections in Connor's talk (green lines) between technical constraints and performance
parameters, suggesting that Connor frequently connected the more technical attributes and inputs of the
design problem. In contrast, Jimmy made more connections between data and design reasoning (purple
lines) in the design discussion.
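A subtracted network such as Figure 6 compares the two students' relative connection strengths. The sketch below assumes a simplified version of that comparison: each student's summed connection counts are scaled to unit length (so that a more talkative student does not dominate simply by contributing more lines) and then subtracted, with positive values marking connections relatively stronger for the first student (green for Connor in Figure 6) and negative values those relatively stronger for the second (purple for Jimmy). ENA's full procedure, including its dimensional reduction and node placement, is not reproduced here, and the counts are toy values.

```python
import math
from collections import Counter


def unit_length(network):
    """Scale a network's connection counts so the vector of counts has length 1."""
    length = math.sqrt(sum(v * v for v in network.values()))
    return {k: v / length for k, v in network.items()} if length else {}


def subtract_networks(first, second):
    """Positive values: relatively stronger in `first`; negative: stronger in `second`."""
    a, b = unit_length(first), unit_length(second)
    return {k: a.get(k, 0.0) - b.get(k, 0.0) for k in a.keys() | b.keys()}


# Toy counts for illustration only, not the study's data.
connor = Counter({("Performance Parameters", "Technical Constraints"): 3,
                  ("Data", "Design Reasoning"): 4})
jimmy = Counter({("Data", "Design Reasoning"): 5})
difference = subtract_networks(connor, jimmy)
```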
Figure 6. Subtracted network of Connor's and Jimmy's discourse, in which green connections occur
more frequently in Connor's talk and purple connections occur more frequently in Jimmy's talk.


Table 3 illustrates this difference in a short excerpt from one of the group's discussions about
interpreting experimental data. In this excerpt, Jimmy discussed design trade-offs, and in his second
comment (Line 2) he made a connection between data and design reasoning: he argued that graphs
showing the results of benchmark testing (data) helped the team make an "informed decision" (design
reasoning) about their design choices. Two turns of talk later (Line 4), Connor added to the discussion by
introducing information about specific attributes and inputs of the design: the performance parameters
(payload, agility, and battery life) of some of the design choices that the team was considering (cadmium
batteries and piezoelectric sensors). This contribution connects to Jimmy's design reasoning comments.

Table 3. Brief Excerpt of the Hydraulic Team's Discussion of Findings during the Graphing Activity

Line | Student | Chat Utterance | Code
1 | Jimmy | They all had both advantages and disadvantages. There was no "obvious" best choice. | Design Reasoning
2 | Jimmy | The graphs indicated the properties of all the different options and made a comparable visual illustration to make an informed decision on which combination to use. | Data, Design Reasoning
3 | Jordan | The graphs detailed what aspects of power sources and control sensors are important — namely, the numerical data. | Data, Technical Constraints
4 | Connor | I suggested using cadmium batteries with piezoelectric sensors; together they make a strong combination of payload and agility while keeping costs in a moderate range and having strong battery life. | Technical Constraints, Performance Parameters, Data
Modelled with the moving stanza window method, this excerpt shows that Connor built on Jimmy's
discussion of data and design reasoning by contributing information about technical constraints and
performance parameters. The moving stanza window method modelled both Jimmy's original
contributions to the team discussion and, separately, the fact that Connor's contribution built on Jimmy's
utterance two lines earlier.

5 DISCUSSION 

Our results suggest that the conversation method and the moving stanza window method identified 
different patterns of connection-making in student discourse. In particular, the conversation method 
summarized the connections made by student teams based on activity, but it did not identify individual 
contributions to team discussions. The moving stanza window method, in contrast, accounted for the 
connections that were made based on activity and temporal proximity; importantly, this method was 
also able to model the contributions of individual students to team conversations. 

Of course, which of these models is most appropriate depends on the theory of discourse being
modelled and on the assumptions one makes about collaborative discourse. For example, if we assume
that talk at the beginning of an activity frames everything that follows (or, similarly, that talk at the end
of an activity builds on everything that preceded it), then the conversation method is appropriate,
because it models connections among all of the talk within a single activity. If, on the other hand, we
assume that connections are sensitive to the temporal proximity of talk, then the moving stanza window
method is a better choice: it models connections locally within an activity, so that very early turns of
talk are not related to ideas that arise much later in the discussion.
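The two assumptions lead to different answers for codes that occur far apart in one activity. In the minimal sketch below (a simplification in which each utterance is reduced to a set of codes and the `window` size is a hypothetical parameter), the conversation method connects any two codes that appear anywhere in the activity, whereas the moving stanza window connects them only if one appears in a referent line and the other falls within that line's window.

```python
def connected(utterances, code_a, code_b, window=None):
    """Return True if code_a and code_b are connected in one activity.

    utterances: list of sets of codes, in chat order.
    window=None: conversation method (the whole activity is one context).
    window=int: moving stanza window of that many lines (referent plus preceding lines).
    """
    if window is None:
        present = set().union(*utterances)
        return code_a in present and code_b in present
    for i, codes in enumerate(utterances):
        context = set().union(*utterances[max(0, i - window + 1):i + 1])
        if (code_a in codes and code_b in context) or (code_b in codes and code_a in context):
            return True
    return False


# "Data" opens the activity and "Design Reasoning" appears only five turns later.
activity = [{"Data"}, set(), set(), set(), set(), {"Design Reasoning"}]
```

Here `connected(activity, "Data", "Design Reasoning")` is True under the conversation method but False with `window=4`, because the two codes never share a window; with `window=6` the window spans the whole activity and they connect again.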

An additional benefit of the moving stanza window method is that it models the role of individual
contributions to group discussions. By sliding a window of a fixed number of lines across a dataset and
defining a stanza for each line of chat, researchers can update the models of discourse after each chat.
Therefore, moving stanza window ENA can update the individual and group models of discourse in real
time, each time a student chats in a virtual discussion. Many CSCL environments already include
integrated feedback and assessment; however, the ability to model individual contributions to group
discussions in a chat's recent temporal context would allow teachers to assess student performance in
online environments in real time (Shaffer, 2017).
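A minimal sketch of such real-time updating, assuming chats arrive one at a time already coded: a fixed-length queue holds the codes of the most recent lines, and each new chat adds its stanza's connections to both the group model and the speaker's individual model. The class name `LiveStanzaModel`, its method, and the default window size are hypothetical, and the binary, referent-to-window connection rule is a simplification rather than ENA's exact accumulation.

```python
from collections import Counter, defaultdict, deque


class LiveStanzaModel:
    """Incrementally updated group and individual connection counts (a sketch)."""

    def __init__(self, window=4):
        self.recent = deque(maxlen=window)  # codes of the last `window` lines, incl. the newest
        self.group = Counter()
        self.individual = defaultdict(Counter)

    def add_chat(self, speaker, codes):
        """Record one coded chat and return the connections its stanza contributes."""
        codes = set(codes)
        self.recent.append(codes)  # the oldest line drops out automatically when full
        context = set().union(*self.recent)
        new = {tuple(sorted((a, b))) for a in codes for b in context if a != b}
        for connection in new:
            self.group[connection] += 1
            self.individual[speaker][connection] += 1
        return new


model = LiveStanzaModel(window=4)
model.add_chat("Jimmy", {"Design Reasoning"})
model.add_chat("Jimmy", {"Data", "Design Reasoning"})
model.add_chat("Jordan", {"Data", "Technical Constraints"})
model.add_chat("Connor", {"Technical Constraints", "Performance Parameters", "Data"})
```

The four calls replay the coded lines of Table 3; after each call, `model.group` and `model.individual` are already up to date, so a display could redraw the networks without reprocessing the whole chat log.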

In future work, the moving stanza window method could help researchers develop tools to support
teacher use of learning analytic models within CSCL environments. Using this method, we could develop
embedded assessments that automatically analyze student chat discourse to measure whether students
make certain connections between key elements during specific activities. By defining a predetermined
set of core connections, we could create a network diagram of student learning that compares student
and group connection-making with the target connections for that activity. Teachers could then use such
models to monitor and support student achievement of learning outcomes as individuals and as teams.
If students were not discussing key conceptual connections, the tool could suggest just-in-time
interventions that are specific, actionable, and based on student networks. Currently, we are developing
a teacher interface tool that shows ENA models of student and group discussions in real time, allowing
teachers to see what connections students make, or do not make, while engaging in our virtual
internships (Shaffer, 2017).
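Under simple assumptions, comparing a network with a predetermined set of core connections could be as direct as checking which target connections a student's or group's network contains. In the sketch below, the target set, the `min_count` threshold, and the function name `compare_to_targets` are all hypothetical.

```python
from collections import Counter


def compare_to_targets(network, targets, min_count=1):
    """Split target connections into those present (at least `min_count` times) and those missing."""
    covered = sorted(t for t in targets if network.get(t, 0) >= min_count)
    missing = sorted(t for t in targets if network.get(t, 0) < min_count)
    return covered, missing


# Hypothetical targets for one activity and a toy student network.
targets = {("Data", "Design Reasoning"), ("Performance Parameters", "Technical Constraints")}
student = Counter({("Data", "Design Reasoning"): 2})
covered, missing = compare_to_targets(student, targets)
```

The `missing` list is the kind of signal a tool could use to prompt a just-in-time intervention; in this toy case it flags the absent link between performance parameters and technical constraints.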

This study is limited in that it focused on the activities of one group of students working in one CSCL
context. The goal of this study was to provide an example of how two different segmentation techniques
produce different models of discourse. By focusing on one team, we were able to go into richer detail
about how an individual student contributed ideas in the context of other teammates' discussion. Future
analyses could examine the other groups in the sample in more depth or apply the moving stanza window
method to other data. Additionally, it is important to determine what sliding window size is most
appropriate for different analyses (Graesser, Dowell, Clewley, & Shaffer, in press), and we are
investigating how to determine the window size that best identifies the recent temporal context for a
given learning environment (Shaffer, 2017).

However, this work empirically highlights a key theoretical distinction between models of connectivity in 
discourse, and perhaps more importantly, it demonstrates that the moving stanza window method 
makes it possible to use ENA to model both group discourse and the contributions of individuals to the 
group within a CSCL context. 